The question is as stated in the title. I want to train a small language model, but I could not work out mathematically whether using a randomly initialized embedding model would hurt performance.
Also, are the embedding sizes set in the code before the embeddings are fed to the model? If they are not, I would expect a matrix size mismatch error.
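To make the size question concrete, here is a minimal NumPy sketch of the situation I have in mind. All names and sizes (`vocab_size`, `embed_dim`, `d_model`, `feed`) are made up for illustration and do not come from any particular library or model. It builds a random embedding table, looks up a few token ids, and shows that feeding vectors of the wrong width into a weight matrix fails unless a projection to the expected size is added first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size = 100   # number of distinct tokens
embed_dim = 16     # width of each random embedding vector
d_model = 32       # input width the (toy) model layer expects

# Random embedding table: one row per token.
embedding_table = rng.normal(0.0, 0.02, size=(vocab_size, embed_dim))

# Embedding lookup is just row indexing.
token_ids = np.array([3, 14, 15, 9])
embedded = embedding_table[token_ids]          # shape (4, embed_dim)

# Toy first layer of the model; it expects inputs of width d_model.
w_in = rng.normal(0.0, 0.02, size=(d_model, d_model))


def feed(x, weight):
    """Apply one linear layer: (n, in_dim) @ (in_dim, out_dim)."""
    return x @ weight


# Feeding 16-wide vectors into a layer that expects 32-wide input fails.
try:
    feed(embedded, w_in)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# One possible fix: a projection from embed_dim to d_model.
projection = rng.normal(0.0, 0.02, size=(embed_dim, d_model))
hidden = feed(embedded @ projection, w_in)     # shape (4, d_model)
```

So in this toy setup the sizes do have to agree (or be bridged by a projection) before the matrix multiplication, otherwise NumPy raises a `ValueError`. Whether a real framework handles this in its configuration is exactly what I am unsure about.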